Identifying the Coding System and Language of On-line Documents on the Internet

Author

  • Gen-ichiro Kikui
Abstract

This paper proposes a new algorithm that simultaneously identifies the coding system and language of a code string fetched from the Internet, especially the World-Wide Web. The algorithm uses statistical language models to select the correctly decoded string as well as to determine the language. The proposed algorithm covers 9 languages and 11 coding systems used in East Asia and Western Europe. Experimental results show that the accuracy of the algorithm exceeds 95% on 640 on-line documents.
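The abstract does not specify the models or the search procedure, so the following is only a minimal sketch of the general idea: decode the byte string under every candidate coding system, score each successful decoding with a per-language statistical model, and keep the best-scoring (coding system, language) pair. The character-bigram model, the add-one smoothing, the constant `VOCAB`, the toy training strings, and the function names `train`, `score`, and `identify` are all assumptions made for illustration, not the paper's method.

```python
import math
from collections import Counter

# Assumed smoothing constant: a rough stand-in for the number of possible bigrams.
VOCAB = 10_000

def train(text):
    """Build a character-bigram model: (bigram counts, total bigram count)."""
    counts = Counter(zip(text, text[1:]))
    return counts, sum(counts.values())

def score(text, model):
    """Average add-one-smoothed log-probability per bigram of `text`."""
    counts, total = model
    pairs = list(zip(text, text[1:]))
    if not pairs:
        return float("-inf")
    return sum(math.log((counts[p] + 1) / (total + VOCAB)) for p in pairs) / len(pairs)

def identify(data, encodings, models):
    """Return the (encoding, language) pair whose decoding scores best."""
    best = (None, None, float("-inf"))
    for enc in encodings:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # the byte string is not valid in this coding system
        for lang, model in models.items():
            s = score(text, model)
            if s > best[2]:
                best = (enc, lang, s)
    return best[0], best[1]

# Toy training data (hypothetical; real models would need large corpora).
MODELS = {
    "en": train("this is the language of the document on the web"),
    "ja": train("これは日本語の文書です。日本語の文章を書きます。"),
}
# Python codec names standing in for a few candidate coding systems.
ENCODINGS = ["euc_jp", "shift_jis", "iso2022_jp", "latin_1"]

data = "これは日本語の文書です。".encode("euc_jp")
print(identify(data, ENCODINGS, MODELS))
```

Scoring the decoded text rather than the raw bytes is what lets one pass settle both questions: a wrong decoding either fails outright or yields character sequences that every language model finds improbable. For pure ASCII input several coding systems decode identically, so this sketch simply reports the first candidate in the list.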





Publication date: 1996